Abstract

Sybil attacks are becoming increasingly widespread and pose a
significant threat to online social systems: a single adversary can
inject multiple colluding identities into the system to compromise
security and privacy. Recent work has leveraged social network-based
trust relationships to defend against Sybil attacks. However,
existing defenses are based on oversimplified assumptions that do
not hold in real-world social graphs.

In this work, we propose SybilFrame, a defense-in-depth framework
for mitigating Sybil attacks when these oversimplified assumptions
are relaxed. Our framework is able to incorporate prior information
about users and edges in the social graph. We validate our framework
on synthetic and real-world network topologies, including a
large-scale Twitter dataset with 20M nodes and 265M edges, and
demonstrate that our scheme performs an order of magnitude better
than previous structure-based approaches.

1 Introduction 

Our systems today are vulnerable to Sybil attacks, in which an
attacker injects multiple fake accounts into the system to
compromise security and privacy. Recently, the increasing popularity
of online social networks has made them attractive targets for Sybil
attacks. It is estimated that tens of millions of Sybil accounts
exist in popular social networks such as Twitter and Facebook.
Attackers can leverage Sybil accounts to compromise system security
by propagating social malware, as well as system privacy by learning
users’ private information.

An important thread of research proposes to mitigate Sybil attacks
using social network-based trust relationships. The key insight of
this line of defense is that it is hard for attackers to establish
trust relationships with benign users. That is, the number of edges
between benign users and Sybil identities (called attack edges) is
limited. Systems such as SybilGuard, SybilLimit, SybilInfer,
SybilRank, and SybilBelief exploit the limited number of attack
edges to detect Sybil identities using graph-theoretic techniques.

While these systems and related works have pioneered the use of
social network structure for Sybil defense, the actual deployment of
these ideas in real-world networks remains controversial. Yang et
al. showed that network structure-based Sybil defenses failed to
identify Sybil accounts in RenRen, a popular social network in
China. This is because structure-based defense mechanisms assume
strong trust relationships between users, such that the number of
attack edges is limited. These assumptions do not hold in networks
with weak trust relationships, which enable an adversary to create a
large number of attack edges. Ghosh et al. showed that on Twitter, a
link farming phenomenon, in which certain benign accounts blindly
accept follow requests, is widespread and poisonous. Thus, in such
weak-trust social networks, previous structure-based Sybil defenses
have limited applicability and performance.

In this paper, we focus on the problem of mitigating Sybil attacks
in social networks with weak trust, i.e., when the number of attack
edges is large. We propose SybilFrame, an approach that provides
defense-in-depth against Sybil attacks. SybilFrame uses a
multi-stage classification mechanism that is able to incorporate
heterogeneous sources and types of information about the social
network. In the first stage, SybilFrame leverages fine-grained local
information about users and edges in the social network to design
classifiers that predict whether users or edges are benign or
malicious. In the second stage, SybilFrame combines information from
the local classifiers with global structural properties of social
networks (even ones with weak trust properties). Our approach uses
the results of local classification about users and edges as prior
probabilities in a pairwise Markov Random Field model, and uses
Loopy Belief Propagation to make probabilistic inferences.
We experimentally evaluate the performance of SybilFrame using both
synthetic and Facebook network topologies. We show that local node
classifiers that are merely better than random (e.g., with false
positive/negative rates as high as 40%) can significantly improve
the Sybil detection accuracy when combined with global structural
information. Similarly, local edge classifiers with even a small
predictive capability provide synergistic information to global
structural inference and improve detection accuracy. Our approach is
resilient to seed targeting attacks and to a high number of attack
edges, both of which are common in social networks with weak trust.

We test SybilFrame on a large-scale Twitter dataset with over 20M
nodes and 265M edges. We obtain information about which accounts in
this dataset were suspended by Twitter, and use this as ground truth
for Sybil attacks. This dataset is typical of social networks with
weak social trust, as the attacker has more than 18M attack edges
for about 145,000 Sybil identities. Even in this challenging setting
with a very large number of attack edges, SybilFrame is able to
detect 51% of the Sybil identities with 4.2% false positives, with
an overall accuracy of 95.4%. In contrast, state-of-the-art
approaches such as SybilBelief predict all nodes to be Sybil and
thus completely fail on this dataset. SybilFrame can also be used as
a mechanism to rank user accounts. In the top 1K accounts ranked by
SybilFrame (in increasing order of being benign), SybilFrame
identifies 55% Sybil accounts, which is 1-2 orders of magnitude
better than state-of-the-art approaches. Furthermore, we manually
examine the profiles of the top 100 ranked users, of which 71 are
suspended and 29 are active, and find that 24 of the active accounts
are highly likely to be malicious. Thus, SybilFrame is able to
uncover a large fraction (24/29) of suspicious accounts that Twitter
fails to detect.

2 Background 

First, we give a formal definition of the Sybil defense problem in
online social systems, and discuss state-of-the-art approaches.
Then, we introduce our design goals.

2.1 Sybil Defense in Online Social Systems 

Consider a network topology $G = (V, E)$, comprising a set $V$ of
nodes and a set $E$ of edges. In social network topologies, a node
$v \in V$ denotes a user on the network, and an edge $(u, v) \in E$
denotes a friendship relationship between two users $u$ and $v$.
Here we only consider mutual relationships, hence $(u, v) \in E$ is
equivalent to $(v, u) \in E$ and $G$ is an undirected graph. Every
node $v \in V$ in the network is either a benign node or a Sybil
identity.

Figure 1 depicts the Sybil attack problem. We call the subnetwork
containing all benign nodes the benign region, and the subnetwork
containing all Sybil nodes the Sybil region. The edges that connect
the benign region and the Sybil region are called attack edges.
Following the established convention in the literature, we do not
impose any constraints on the size or the shape of the Sybil region.
Attackers can create an unlimited number of Sybil nodes and set up
edges between them arbitrarily. The main goal of Sybil defense is to
design a mechanism that detects as many Sybil nodes as possible
while minimizing the number of benign nodes that are misdetected,
i.e., while keeping the false positive rate low.


Figure 1: Sybil attack problem. (Diagram labels: Benign Region, Sybil Region.)

2.2 State-of-the-art Approaches 

Content-based approaches: Content-based approaches seek to filter
Sybil accounts by analyzing the associated content information, such
as news feeds and wall posts on Facebook, and tweets and hashtags on
Twitter. These approaches span a large category of mechanisms,
including blacklisting, whitelisting, URL filtering, as well as
various machine learning methods, such as Bayesian Reasoning,
Support Vector Machines and Clustering. A major problem of these
approaches is that attackers can mimic the behaviors of benign users
and produce similar content information, thus making content-based
approaches less effective.

Structure-based approaches: Structure-based approaches seek to
exploit graph-theoretic differences between benign and Sybil
identities. The key insight is that in a social graph where edges
represent strong trust relationships between users, it is hard for
attackers to set up links to benign users. As a result, the number
of attack edges is relatively small. Such networks preserve a strong
level of homophily, i.e., two linked nodes are likely to have
similar attributes.

SybilGuard and SybilLimit rely on the insight that it is easy for
short random walks starting from a benign user to quickly reach
other benign users, while it is hard for random walks starting from
Sybil identities to enter the benign region. SybilInfer [7] relies
on random walks combined with Bayesian inference and Monte-Carlo
sampling, and aims to directly detect the bottleneck cut between
benign and Sybil identities. SybilRank uses short random walks to
distribute initial scores from a set of trusted benign seeds, and
relies on the insight that benign users tend to have larger
degree-normalized scores than Sybil identities. The Criminal account
Inference Algorithm (CIA), similar to SybilRank, starts random walks
and distributes scores from Sybil seed users, and allows a restart
from the initial probability distribution with a certain
probability. Researchers have shown that, despite considerable
differences, the above schemes rely on identifying local communities
around a trust node. SybilBelief, on the other hand, models the
distribution of node labels as a pairwise Markov Random Field.
Similar to SybilFrame, it adopts Loopy Belief Propagation to
estimate the probabilities of users being benign. Integro extends
SybilRank by incorporating victim predictions based on content
features, and is thus not purely structure-based.

We note that all of the above-mentioned structure-based methods rest
on two key assumptions. First, the benign region is fast mixing,
which presumes the existence of a well-connected, giant community
structure of benign users. Second, the social network is a strong
trust network, where the number of attack edges is relatively small.
Under these two assumptions, these structure-based approaches have
been shown to provide reliable performance.

2.3 Assumptions vs. Reality 

We claim that the above-mentioned assumptions oversimplify social
network structure, and do not hold well on all real-world social
graphs.

First, benign users tend to form multiple small communities driven
by different purposes (e.g., geographical location, education and
career). This multi-community structure prohibits the existence of a
giant community component and hence results in a longer mixing time.
Mohaisen et al. measured the mixing time of real-world social graphs
and found that the actual mixing time is longer than the
theoretically anticipated value.

Second, real-world social networks do not necessarily represent
strong trust networks. Yang et al. showed that RenRen, the largest
social networking platform in China, does not follow this
assumption. Another typical example is Twitter. The Twitter network
is a directed network, in which links are established by the action
of “follow”. Unlike on Facebook, users on Twitter often use a
pseudonym, which makes them less careful about whom they choose to
follow. Ghosh et al. showed that link farming is widespread on
Twitter, and that a majority of attack edges are farmed from a small
fraction of Twitter users. Those users, the social capitalists, are
benign users who seek to increase their social power and links by
following back anyone who follows them. Even normal users, who are
less eager for social power than social capitalists, are likely to
follow back strangers, either because they want to read their tweets
or simply out of courtesy. On weak trust social networks like
Twitter, a large number of attack edges exist and the benign region
may not be easily separable from the Sybil region. As a result, all
of these structure-based Sybil defense mechanisms are limited in
their performance.

2.4 Design Goals 

We aim to design a scheme that works even when the 
fast-mixing and strong trust assumptions are relaxed. 
Our overall design goals are as follows: 

1) Defense-in-depth: The scheme should provide 
multi-layered protection, and be robust to different attack 
strategies. 

2) Accuracy: The scheme should have reliable detection accuracy
when applied to a wide range of social network topologies, including
both strong trust and weak trust social networks.

3) Scalability: The scheme should be scalable to large 
social networks, and be amenable to parallel deployment. 

We propose SybilFrame, a defense-in-depth framework that adopts a
multi-stage classification mechanism for incorporating heterogeneous
sources and types of information about the social network.

3 The SybilFrame Framework 

In this section, we give a detailed description of the SybilFrame
framework.

3.1 Framework Overview 



Figure 2: SybilFrame framework. 

Figure 2 shows the general framework of SybilFrame. SybilFrame is a
multi-stage classification approach that leverages the attributes of
an individual node and the correlation between connected nodes to
make a combined classification of networked data. SybilFrame has two
stages of inference. Once the raw data has been fed into the
framework, Stage 1 explores the dataset and extracts useful
information to compute node prior information and edge prior
information (Section 3.2). This prior information, together with a
small set of nodes whose labels are known, i.e., trust seeds, is fed
into Stage 2. Stage 2 is the posterior inference layer. To represent
the correlation between nodes, we model the problem as a pairwise
Markov Random Field (Section 3.3). We adopt Loopy Belief Propagation
(Section 3.4) to make inferences about the posterior information.
This posterior information is then used to classify and rank Sybil
identities (Section 3.5).

3.2 Prior Information 

Stage 1 in Figure 2 takes the raw dataset as input, and outputs the
prior information of all nodes and edges. We now formalize our
notion of priors.

For a node $v \in V$, we denote $Prior_v$ as the node prior of $v$.
$Prior_v$ is a real number in the range $[0, 1]$ that quantifies the
probability that node $v$ takes a benign label. The larger $Prior_v$
is, the more likely it is that $v$ is a benign node. Specifically,
$Prior_v > 0.5$ means that $v$ is more likely to take a benign label
than a Sybil label. Similarly, $Prior_v < 0.5$ means that $v$ is
more likely to take a Sybil label, and $Prior_v = 0.5$ means that
$v$ takes a benign or Sybil label with equal probability. If $v$’s
label is known, then $Prior_v = 1$ for a benign trust seed, and
$Prior_v = 0$ for a Sybil trust seed.

For two nodes $u$ and $v$ that are connected by an edge, we denote
$Prior_{u,v}$ as the edge prior of $(u, v) \in E$. $Prior_{u,v}$ is
a real number in the range $[0, 1]$ that quantifies the likelihood
that node $u$ and node $v$ take the same label. The larger
$Prior_{u,v}$ is, the more likely it is that $u$ and $v$ take the
same label. Specifically, $Prior_{u,v} > 0.5$ means that $u$ and $v$
are more likely to take the same label than different labels.
Similarly, $Prior_{u,v} < 0.5$ means that $u$ and $v$ are more
likely to take different labels, and $Prior_{u,v} = 0.5$ means that
$u$’s label has no influence on $v$’s label, and vice versa.
Generally, $Prior_{u,v}$ models the coupling strength between $u$
and $v$: $Prior_{u,v} > 0.5$ refers to a positive coupling
relationship, $Prior_{u,v} < 0.5$ refers to a negative coupling
relationship, and $Prior_{u,v} = 0.5$ means that there is no
coupling between $u$ and $v$.

The derivation of node priors and edge priors is based on the
dataset we are given. We can leverage heterogeneous information
sources to make inferences. To compute node priors, we can leverage
structural information and explore differences in local structure
between benign and Sybil nodes. We can extract useful features,
build a machine learning classifier that supports probability
estimates, and use its probability outputs as node priors. To infer
edge priors, we want to assign lower scores to attack edges and
higher scores to edges between benign accounts. We do not care about
edges between Sybil accounts, since the attacker has complete
control over the Sybil region and can change it arbitrarily. This
makes our approach robust to a high number of attack edges and
distinguishes SybilFrame from previous approaches. Since benign
nodes tend to behave similarly and Sybil nodes tend to behave
differently from benign nodes, a straightforward way is to measure
the similarity of two connected nodes under different metrics and
obtain a scaled overall similarity score. This overall score can
then be used as an edge prior.

We note that although we propose a structure-based scheme, as will
be demonstrated and evaluated later, our framework can also
incorporate content information. For example, we can analyze the
news feeds of each Facebook account and the tweets of each Twitter
account, and identify spam keywords and abnormal actions. We can
then build a content-based classifier and compute node priors. The
same philosophy works for content-based edge priors. The reason why
we tend to use structural information is that it is harder for an
attacker to alter the overall graph structure than to mimic the
content behaviors of benign users. In later sections, we will
explore ways to compute node priors and edge priors on real-world,
large-scale social graphs.


3.3 Markov Random Field 


A Markov Random Field (MRF) is a probabilistic graphical model over
an undirected graph. Nodes in an MRF are random variables, and edges
model the correlation between those random variables. With each node
$v \in V$ of graph $G = (V, E)$, we associate a binary random
variable $X_v$ that indicates the label of $v$: $X_v = 1$ refers to
a benign label, and $X_v = -1$ refers to a Sybil label. To quantify
the correlation, we use a set of functions called clique potentials.
A clique potential is a function defined over a set of random
variables that maps any joint assignment of these random variables
to a real number indicating how favorable this joint assignment is.
Let $\Psi$ denote the set of potential functions. Specifically, if
we only consider cliques comprising at most two connected nodes,
$\Psi$ can be divided into the following two types of functions.


$$\psi_v(X_v) = \begin{cases} Prior_v, & \text{if } X_v = 1 \\ 1 - Prior_v, & \text{if } X_v = -1 \end{cases} \tag{1}$$

$$\psi_{u,v}(X_u, X_v) = \begin{cases} Prior_{u,v}, & \text{if } X_u X_v = 1 \\ 1 - Prior_{u,v}, & \text{if } X_u X_v = -1 \end{cases} \tag{2}$$

As defined in Section 3.2, $Prior_v$ is the prior information of
node $v$, and $Prior_{u,v}$ is the prior information of edge
$(u, v)$. We call the function $\psi_v$ the node potential, and the
function $\psi_{u,v}$ the edge potential. $(G, \Psi)$ then defines a
pairwise Markov Random Field.

Given a pairwise MRF $(G, \Psi)$, where $G = (V, E)$ and
$\Psi = \{\psi_v, \psi_{u,v}\}$, the full joint probability
distribution is specified as

$$P(X_V) = \frac{1}{Z} \prod_{v \in V} \psi_v(X_v) \prod_{(u,v) \in E} \psi_{u,v}(X_u, X_v) \tag{3}$$

Here, $X_V$ denotes a particular joint assignment of all random
variables in set $V$, and $Z$ is the partition function given by

$$Z = \sum_{X_V} \prod_{v \in V} \psi_v(X_v) \prod_{(u,v) \in E} \psi_{u,v}(X_u, X_v) \tag{4}$$
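To make Equations (1) to (4) concrete, the following is a minimal Python sketch that evaluates the potentials and the partition function by brute-force enumeration on a toy graph. The function names and the dictionary-based graph representation are illustrative assumptions, not part of SybilFrame, and enumeration is only feasible for a handful of nodes.

```python
from itertools import product


def node_potential(prior_v, x_v):
    # Eq. (1): Prior_v if X_v = 1, otherwise 1 - Prior_v.
    return prior_v if x_v == 1 else 1.0 - prior_v


def edge_potential(prior_uv, x_u, x_v):
    # Eq. (2): Prior_{u,v} if the labels agree, otherwise 1 - Prior_{u,v}.
    return prior_uv if x_u * x_v == 1 else 1.0 - prior_uv


def unnormalized_joint(assignment, node_prior, edge_prior):
    # Product of all node and edge potentials, i.e. Eq. (3) without 1/Z.
    score = 1.0
    for v, x_v in assignment.items():
        score *= node_potential(node_prior[v], x_v)
    for (u, v), p in edge_prior.items():
        score *= edge_potential(p, assignment[u], assignment[v])
    return score


def exact_marginals(node_prior, edge_prior):
    # Enumerate all 2^|V| joint assignments (hypothetical helper, toy sizes only).
    nodes = sorted(node_prior)
    z = 0.0
    benign_mass = {v: 0.0 for v in nodes}
    for labels in product((1, -1), repeat=len(nodes)):
        assignment = dict(zip(nodes, labels))
        weight = unnormalized_joint(assignment, node_prior, edge_prior)
        z += weight  # Eq. (4)
        for v in nodes:
            if assignment[v] == 1:
                benign_mass[v] += weight
    return z, {v: benign_mass[v] / z for v in nodes}
```

For example, with node priors `{"a": 0.9, "b": 0.5}` and a single edge `("a", "b")` with prior 0.9, the uninformative node `b` is pulled toward the benign label purely through its coupling with `a`.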
3.4 Infer Posteriors

Given the pairwise MRF $(G, \Psi)$, which contains the prior
information of trust seeds and of other nodes and edges, for each
node $v \in V$ we want to infer the posterior probability of the
random variable $X_v$:

$$P(X_v) = \frac{1}{Z} \sum_{X_{V \setminus v}} \prod_{s \in V} \psi_s(X_s) \prod_{(u,s) \in E} \psi_{u,s}(X_u, X_s) \tag{5}$$

Exact inference is computationally difficult and does not scale to
large datasets, since the sum in Equation (5) ranges over all
$2^{|V|-1}$ joint assignments of the remaining variables. Therefore,
we adopt Loopy Belief Propagation to make approximate inferences.
Loopy Belief Propagation is an iterative process in which
neighboring variables pass messages or beliefs to each other.
Algorithm 1 gives the Loopy Belief Propagation algorithm for the
pairwise MRF $(G, \Psi)$.


Algorithm 1: Loopy Belief Propagation Algorithm
Data: node potentials $\psi_v(X_v)$, edge potentials $\psi_{u,v}(X_u, X_v)$
Result: marginal beliefs $bel_v(X_v)$

Initialize beliefs $bel_v(X_v) = 1$ for all nodes $v$
Initialize messages $m_{u \to v}(X_v) = 1$ for all edges $uv$

repeat
    Messages update
    $m_{u \to v}(X_v) = \sum_{X_u} \left( \psi_u(X_u) \, \psi_{u,v}(X_u, X_v) \prod_{s \in Nbd(u) \setminus v} m_{s \to u}(X_u) \right)$
    Beliefs update
    $bel_v(X_v) \propto \psi_v(X_v) \prod_{u \in Nbd(v)} m_{u \to v}(X_v)$
until number of iterations > threshold $d$


We note that for social networks with loops, LBP approximates the
posterior probability distribution without theoretical convergence
guarantees. However, LBP has been widely used and has demonstrated
good results in practical applications. Through our experiments, we
find that setting $d$ to 5 or 6 achieves good results.
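The message and belief updates of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name `loopy_bp`, the dictionary-based inputs, the synchronous update schedule, and the per-message normalization (added for numerical stability) are all assumptions. It also assumes every prior lies strictly between 0 and 1, so that no message sums to zero.

```python
def loopy_bp(node_prior, edge_prior, iterations=5):
    """Return the belief that each node is benign after `iterations` rounds."""
    labels = (1, -1)

    # Symmetric adjacency map: nbrs[u][v] is the edge prior of (u, v).
    nbrs = {v: {} for v in node_prior}
    for (u, v), p in edge_prior.items():
        nbrs[u][v] = p
        nbrs[v][u] = p

    def psi_node(v, x):
        return node_prior[v] if x == 1 else 1.0 - node_prior[v]

    def psi_edge(p, x_u, x_v):
        return p if x_u * x_v == 1 else 1.0 - p

    # msgs[(u, v)][x_v] is the message from u to v, initialized to 1.
    msgs = {(u, v): {x: 1.0 for x in labels} for u in nbrs for v in nbrs[u]}

    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            out = {}
            for x_v in labels:
                total = 0.0
                for x_u in labels:
                    term = psi_node(u, x_u) * psi_edge(nbrs[u][v], x_u, x_v)
                    for s in nbrs[u]:
                        if s != v:  # product over Nbd(u) \ v
                            term *= msgs[(s, u)][x_u]
                    total += term
                out[x_v] = total
            norm = out[1] + out[-1]  # normalization is an added assumption
            new_msgs[(u, v)] = {x: out[x] / norm for x in labels}
        msgs = new_msgs  # synchronous schedule: all messages updated together

    beliefs = {}
    for v in nbrs:
        b = {x: psi_node(v, x) for x in labels}
        for u in nbrs[v]:
            for x in labels:
                b[x] *= msgs[(u, v)][x]
        beliefs[v] = b[1] / (b[1] + b[-1])
    return beliefs
```

On a graph without loops the beliefs coincide with the exact marginals of Equation (5) once enough iterations have run for messages to cross the graph; on loopy graphs they are approximations, as noted above.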

Scalability: The complexity of LBP is $O(md)$, where $m$ is the
number of edges and $d$ is the number of iterations. For sparse
social networks, $O(md) = O(nd)$, where $n$ is the number of nodes.
LBP is essentially parallelizable, and we will discuss related
implementation issues in Section 6.3.


3.5 Sybil Accounts Prediction and Ranking

We use the posteriors obtained in Section 3.4 to predict the label
of each node. For a node $v$ whose label is unknown, we predict the
label $L_v$ using the following rule.

$$L_v = \mathrm{sign}(bel_v - 0.5) \tag{6}$$

where $L_v = 1$ means that $v$ is predicted as a benign node, and
$L_v = -1$ means that $v$ is predicted as a Sybil node.

We can also rank nodes in ascending order of their posteriors to
produce a ranking list. Sybil nodes are likely to have lower
posteriors, and thus appear more often near the front of the list.
OSN operators can then go through the list from the beginning and
check a fixed number of nodes. More effective posteriors let OSN
operators detect more Sybil accounts within a given amount of time.
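The prediction rule of Equation (6) and the ranking step can be sketched as follows. The function names are hypothetical, `posterior` is assumed to map each node to its belief of being benign, and a posterior of exactly 0.5 yields 0 (undecided), which is simply the value of the sign function there.

```python
def predict_labels(posterior, known=()):
    """Eq. (6): 1 = benign, -1 = Sybil, 0 = undecided (posterior exactly 0.5)."""
    predictions = {}
    for v, p in posterior.items():
        if v in known:  # trust seeds already have labels
            continue
        predictions[v] = (p > 0.5) - (p < 0.5)
    return predictions


def rank_by_suspicion(posterior):
    """Nodes in ascending order of posterior, most suspicious first."""
    return sorted(posterior, key=posterior.get)
```

An operator with a fixed inspection budget of k accounts would then examine `rank_by_suspicion(posterior)[:k]`.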

4 Security Evaluation on Synthetic Networks

In this section, we evaluate SybilFrame on different network
structures. For comparison, we use SybilBelief, which takes a
probabilistic inference approach similar to that of SybilFrame.
Since Gong et al. have demonstrated that SybilBelief outperforms
other structure-based methods on trust networks, we only compare
with SybilBelief here to save space. In later sections, we compare
with other methods such as SybilLimit [6], SybilInfer [7], and
SybilRank. We do not compare with Integro, since it leverages
network-specific content information for victim predictions.

Basic experimental setup: We adopt the Preferential Attachment (PA)
model to generate both the benign region and the Sybil region. The
size of the benign region is 1000, and the size of the Sybil region
is 400. The average degree of both regions is 10. We randomly add
1000 attack edges between the two regions. We use only 1 benign
trust seed and 1 Sybil trust seed. For default node priors, we set
0.9 for benign trust seeds, 0.1 for Sybil trust seeds, and 0.5 for
all other nodes if no external priors are fed in. For default edge
priors, we use 0.9 in order to model homophily. We study the impact
of different factors. When we study one factor, we fix the other
factors to be the same as in the basic setup, and vary only the
studied one. Under each setting, we run 100 experiments. In each
experiment, we randomly select 1 benign trust seed and 1 Sybil trust
seed, configure the prior information, and run SybilFrame and
SybilBelief. We store the results of SybilFrame and SybilBelief
separately, and report the average over the 100 experiments as our
final results.
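A topology matching the basic setup could be generated along the following lines. This is a sketch under stated assumptions: the PA generator is a simple Barabasi-Albert style procedure in which each new node attaches to m = 5 existing nodes (giving an average degree close to 10), and the function names, node numbering, and seeding are illustrative rather than taken from the paper.

```python
import random


def preferential_attachment(n, m, rng, offset=0):
    """Edges of a PA graph on nodes offset..offset+n-1; each new node adds m edges."""
    edges = set()
    targets = list(range(m))  # the first new node links to the m initial nodes
    repeated = []             # every node appears once per incident edge
    for new in range(m, n):
        edges.update((offset + t, offset + new) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling without repeats
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def build_synthetic(n_benign=1000, n_sybil=400, m=5, n_attack=1000, seed=0):
    rng = random.Random(seed)
    benign_edges = preferential_attachment(n_benign, m, rng)
    sybil_edges = preferential_attachment(n_sybil, m, rng, offset=n_benign)
    attack = set()
    while len(attack) < n_attack:  # random benign-to-Sybil attack edges
        attack.add((rng.randrange(n_benign), n_benign + rng.randrange(n_sybil)))
    labels = {v: 1 for v in range(n_benign)}
    labels.update({v: -1 for v in range(n_benign, n_benign + n_sybil)})
    return benign_edges | sybil_edges | attack, attack, labels


def default_priors(edges, labels, rng):
    """Default priors of the basic setup: one seed per region, 0.5 elsewhere."""
    benign = [v for v, label in labels.items() if label == 1]
    sybil = [v for v, label in labels.items() if label == -1]
    node_prior = {v: 0.5 for v in labels}
    node_prior[rng.choice(benign)] = 0.9  # benign trust seed
    node_prior[rng.choice(sybil)] = 0.1   # Sybil trust seed
    edge_prior = {e: 0.9 for e in edges}  # homophily
    return node_prior, edge_prior
```

Repeating `build_synthetic` and `default_priors` with 100 different seeds and averaging the resulting metrics mirrors the experimental procedure described above.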

Evaluation metrics: Following the convention, we denote Sybil nodes
as positive examples and benign nodes as negative examples. Thus, we
have TP (Sybil → Sybil), TN (benign → benign), FP (benign → Sybil)
and FN (Sybil → benign). We use the following four evaluation
metrics:

1) Accuracy: $(TP + TN)/(TP + TN + FP + FN)$

2) Number of rejected benign nodes: FP

3) Number of accepted Sybil nodes: FN

4) Area Under the Receiver Operating Characteristic Curve (AUC):
The probability that a randomly selected benign node ranks higher
than a randomly selected Sybil node, given the ranking of the
posteriors of all nodes from the smallest to the largest.
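The four metrics can be computed as in the sketch below. The function names are illustrative, labels are assumed to use 1 for benign and -1 for Sybil, and ties in the AUC computation are counted as one half, which is a common convention that the text does not specify. The pairwise AUC loop is quadratic and intended for small experiments only.

```python
def confusion_counts(true_labels, predicted):
    """Return (TP, TN, FP, FN) with Sybil (-1) as the positive class."""
    tp = tn = fp = fn = 0
    for v, pred in predicted.items():
        truth = true_labels[v]
        if truth == -1 and pred == -1:
            tp += 1  # Sybil -> Sybil
        elif truth == 1 and pred == 1:
            tn += 1  # benign -> benign
        elif truth == 1 and pred == -1:
            fp += 1  # benign -> Sybil (rejected benign node)
        elif truth == -1 and pred == 1:
            fn += 1  # Sybil -> benign (accepted Sybil node)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def auc(true_labels, posterior):
    """Probability that a random benign node has a higher posterior than a random Sybil."""
    benign = [posterior[v] for v, label in true_labels.items() if label == 1]
    sybil = [posterior[v] for v, label in true_labels.items() if label == -1]
    wins = 0.0
    for b in benign:
        for s in sybil:
            if b > s:
                wins += 1.0
            elif b == s:
                wins += 0.5  # tie handling is an assumption
    return wins / (len(benign) * len(sybil))
```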
Figure 3: Vary FPR=FNR (node prior). Panels: (a) Accuracy, (b) AUC, (c) Rejected benign nodes, (d) Accepted Sybil nodes.

4.1 Influence of Node Priors 

We want to explore SybilFrame when only incorporating external node
priors. Since we are experimenting with synthetic networks, we need
a way to obtain node priors that model the real case. A
straightforward solution is to use the false positive rate (FPR) and
false negative rate (FNR) to model the performance of an external
node classifier. By choosing different FPR and FNR values, we can
generate prior scores that model the level of noise, and use them to
evaluate SybilFrame. Due to limited space, we list our Node Prior
Generator algorithm in Appendix 9.1. In that algorithm, we set the
prior for benign/Sybil trust seeds to 0.9/0.1, not 1/0 as discussed
in Section 3.2, in order to run LBP successfully.
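The actual Node Prior Generator is given in the appendix and is not reproduced here. As a rough illustration of the idea only, the sketch below simulates an external classifier with given error rates: a benign node is misclassified with probability FPR and a Sybil node with probability FNR. The function name, the uniform score ranges [0.6, 0.9] and [0.1, 0.4], and the use of Python's `random` module are all assumptions made for this example.

```python
import random


def generate_node_priors(labels, seeds, fpr, fnr, rng):
    """Simulate noisy node priors; labels use 1 = benign, -1 = Sybil."""
    priors = {}
    for v, label in labels.items():
        if v in seeds:
            # Trust seeds use 0.9/0.1 rather than 1/0, as stated in the text.
            priors[v] = 0.9 if label == 1 else 0.1
            continue
        error_rate = fpr if label == 1 else fnr
        misclassified = rng.random() < error_rate
        looks_benign = (label == 1) != misclassified
        # Hypothetical score ranges on either side of 0.5.
        if looks_benign:
            priors[v] = rng.uniform(0.6, 0.9)
        else:
            priors[v] = rng.uniform(0.1, 0.4)
    return priors
```

Setting `fpr = fnr = 0` corresponds to a perfect classifier, and `fpr = fnr = 0.5` to random guessing, matching the range explored below.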

Varying FPR and FNR: First, we evaluate SybilFrame given node priors
with different levels of noise. We tune FPR = FNR from 0 to 0.5,
i.e., from perfect classification to random guessing, and fix
everything else as in the basic setup. In addition to the comparison
with SybilBelief, we also compare with the performance of the
external node classifier itself, i.e., with the priors, and explore
whether there are improvements. Figure 3 shows the results. As FPR =
FNR increases, the performance of the external node classifier
degrades linearly. SybilFrame performs better than SybilBelief when
FPR = FNR < 0.4, in terms of all four metrics. When FPR = FNR < 0.3,
SybilFrame achieves near optimal performance. This means that
SybilFrame is resilient to prior noise with FPR and FNR as high as
40%.

Varying the number of attack edges: Second, we evaluate SybilFrame
when the number of attack edges changes. We set FPR and FNR to 0.3,
and vary the number of attack edges from 0 to 1000. Figure 4 shows
the results. We find that both SybilFrame and SybilBelief perform
well with fewer than 200 attack edges. However, as the number of
attack edges increases, the performance of SybilBelief degrades
while SybilFrame still has stable and near optimal detection
accuracy.

Figure 4: Vary the number of attack edges (node prior). Panels: (a) Accuracy, (b) AUC, (c) Rejected benign nodes, (d) Accepted Sybil nodes; x-axis: number of attack edges.

Varying the size of the Sybil region: Furthermore, we evaluate
SybilFrame when the attacker changes the size of the Sybil region.
We do not consider the case when the Sybil region is too small,
since it has limited utility for performing large-scale attacks. We
set FPR and FNR to 0.3, and vary the size of the Sybil region from
400 to 1000. Figure 5 shows the results. When there are more Sybil
nodes, the performance of both SybilFrame and SybilBelief improves.
This is because when both the benign and Sybil regions are large,
the internal homophily is strong enough to overcome the influence of
attack edges. However, SybilFrame still performs better than
SybilBelief.
Figure 5: Vary the size of Sybil region (node prior)

Figure 6: Vary FPR=FNR (edge prior)

Figure 7: Vary the number of attack edges (edge prior)


4.2 Influence of Edge Priors 

We want to explore SybilFrame when only incorporating external edge
priors. Similarly, we use FPR and FNR to model the performance of an
external edge classifier, which distinguishes attack edges from
other edges. We list our Edge Prior Generator algorithm in
Appendix 9.1.

Varying FPR and FNR: First, we tune FPR = FNR for edge priors from 0
to 0.5. We run SybilFrame with default node priors and compare with
SybilBelief. From Figure 6, with FPR = FNR < 0.3, SybilFrame
performs better than SybilBelief. A figure in Appendix 9.2 shows the
results when we set FPR = 0.1 and tune FNR from 0 to 0.5. As we can
see, SybilFrame performs well and outperforms SybilBelief even when
FNR is 0.5. This means that as long as the external edge classifier
has some power to detect attack edges, incorporating edge priors
into SybilFrame gives better performance.

Varying the number of attack edges: Second, we set FPR to 0.1 and
FNR to 0.5, and vary the number of attack edges from 0 to 1000.
Figure 7 shows the results. As the number of attack edges increases,
the performance of both SybilFrame and SybilBelief degrades.
However, SybilFrame still outperforms SybilBelief. Notice that the
performance of SybilFrame depends on the detection accuracy of the
external classifier. If we have a classifier with 0.1 FPR and 0.1
FNR, SybilFrame will have near optimal performance.

Varying the size of the Sybil region: Furthermore, we evaluate
SybilFrame with edge priors when the attacker changes the size of
the Sybil region. We set FPR to 0.1 and FNR to 0.5, and vary the
size of the Sybil region from 400 to 1000. From Figure 8, SybilFrame
improves its performance when there are more Sybil nodes, and still
outperforms SybilBelief.

Figure 8: Vary the size of Sybil region (edge prior)

4.3 Resilient Against Seed Targeting Attacks

We are interested in the impact of seed targeting attacks, i.e.,
when the known labeled nodes are end points of attack edges. We
consider the following cases:

1) SI: Benign (Sybil) trust seeds are not end points of 
attack edges. 

2) SII: Benign (Sybil) trust seeds are end points of 
attack edges. 

Figure 9 shows the accuracy as a function of the number of attack
edges for the four scenario combinations of trust seeds, in the node
prior experiment (FPR=FNR=0.3) and the edge prior experiment
(FPR=0.1, FNR=0.5). We find that the location of trust seeds has no
influence on the detection accuracy. Due to limited space, we list
the results for AUC, FP and FN in Appendix 9.3. We find that
SybilFrame is resilient against seed targeting attacks, and we can
simply select trust seeds uniformly at random.

Figure 9: Accuracy of SybilFrame under seed targeting attacks. (a) Given node priors, (b) Given edge priors.

4.4 Summary 

In summary, we have the following observations: 

1) SybilFrame outputs near optimal results when incorporating node
priors with FPR and FNR less than 0.3.

2) SybilFrame outperforms SybilBelief when incorporating node
priors with FPR and FNR less than 0.4.

3) When incorporating edge priors, as long as the edge priors have
a low FPR (0.1) and some level of FNR (less than 0.5), SybilFrame
outperforms SybilBelief.

4) SybilFrame is robust to different attack strategies 
and resilient against seed targeting attacks. 

5 Evaluation on Facebook Network 

We evaluate SybilFrame on a semi-real Facebook network, 
and compare it with state-of-the-art Sybil defense mechanisms: 
SybilLimit, SybilInfer, SybilRank and SybilBelief. 
We find that SybilFrame performs orders of magnitude 
better than the other methods, especially when the 
number of attack edges is large. Furthermore, the performance 
of SybilFrame is stable and near optimal. 

5.1 Dataset Description 

The dataset we use is the ego-Facebook dataset obtained 
from the Stanford Network Analysis Project (SNAP) [22]. 
The Facebook graph contains 4,039 nodes and 88,234 
edges. In this graph, nodes are Facebook accounts and 
edges are friendship relationships. The graph is connected 
and undirected, with diameter 8 and average 
clustering coefficient 0.6055. 

5.2 Experimental Setup 

We construct the network topology as follows. We use 
this Facebook dataset as both the benign region and the Sybil 
region, and randomly add attack edges between the two 
regions. We vary the number of attack edges from 1000 
to 20000, and evaluate the performance of SybilFrame, 
as well as SybilLimit, SybilInfer, SybilRank and SybilBelief. 
We randomly select 1 benign trust seed and 1 
Sybil trust seed, repeat each experiment 100 times, 
and report the average. 

5.3 Compute Prior Information 

To run SybilFrame, we need prior information. Since the 
benign region is identical to the Sybil region, we are not 
able to collect distinguishable node priors. Thus, we only 
explore ways to compute edge priors. 

As discussed earlier, we can leverage the similarity 
between two connected nodes, and use it as a prior for the 
edge between them. Intuitively, connected benign nodes 
are similar, whereas connected benign and Sybil nodes are 
not. Therefore, attack edges should have a lower 
score than non-attack edges. We adopt the Jaccard index 
here as a measure of similarity. For an edge $(u,v) \in E$, 
its Jaccard index [23] is defined as 
$\frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$, where 
$\Gamma(u)$ denotes the set of one-hop neighbors of node $u$, and 
$|\Gamma(u) \cap \Gamma(v)|$ denotes the number of common neighbors 
of $u$ and $v$. For edges that connect two trust seeds, we set 
the prior to 0.1 if the edge is an attack edge, and 
to 0.9 if the edge is a non-attack edge. For other edges, 
we compute the corresponding Jaccard index. We scale 
these indices into the range [0.1, 0.9]. These scaled Jaccard 
scores are then used as priors. 
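
The Jaccard-based edge prior described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the text does not say how scores are scaled into [0.1, 0.9], so linear min-max scaling is assumed here. The special handling of edges between two trust seeds (0.1 or 0.9) is omitted.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def build_neighbors(edges: Iterable[Edge]) -> Dict[int, Set[int]]:
    """Build the one-hop neighbor sets Gamma(u) of an undirected graph."""
    nbrs: Dict[int, Set[int]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(nbrs: Dict[int, Set[int]], u: int, v: int) -> float:
    """|Gamma(u) & Gamma(v)| / |Gamma(u) | Gamma(v)|."""
    union = nbrs[u] | nbrs[v]
    return len(nbrs[u] & nbrs[v]) / len(union) if union else 0.0


def jaccard_edge_priors(edges, lo=0.1, hi=0.9):
    """Scale raw Jaccard scores into [lo, hi].

    ASSUMPTION: linear min-max scaling; the paper only states the target range.
    """
    edges = list(edges)
    nbrs = build_neighbors(edges)
    raw = {e: jaccard(nbrs, *e) for e in edges}
    smin, smax = min(raw.values()), max(raw.values())
    if smax == smin:  # all edges equally similar: fall back to a neutral prior
        return {e: (lo + hi) / 2 for e in edges}
    return {e: lo + (hi - lo) * (s - smin) / (smax - smin) for e, s in raw.items()}
```

On a triangle with one pendant edge, the pendant edge shares no neighbors and receives the lowest prior, which matches the intuition that weakly embedded edges are more likely attack edges.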

We can also use other similarity metrics, such as the Cosine 
index [24] or the Adamic-Adar index, and combine 
them to obtain an overall similarity score. A possible approach 
is to use these raw similarity scores as features 
for an edge, and obtain a feature matrix for all edges 
on the graph. We can then adopt a supervised learning 
approach by leveraging existing tools, such as Logistic 
Regression [25] and Support Vector Machines, and 
make probabilistic predictions of each edge being a non-attack 
or attack edge. These probabilistic outputs can then 
be used as overall prior scores. 

5.4 Results 

Figure 10 shows the performance of SybilFrame and 
other Sybil defense mechanisms as we vary the number 
of attack edges from 1000 to 20000. Since SybilRank 
is a ranking scheme and it is very hard to directly use 
its degree-normalized scores to make predictions, we 
only compare with SybilRank in terms of AUC. Also, 
since SybilLimit and SybilInfer output binary predictions 
rather than belief scores, we do not include them in the 
AUC comparison. We find that: 1) As the number of attack 
edges increases, the performance of previous methods 
degrades, with lower accuracy (SybilLimit, SybilInfer, 
SybilBelief) and lower AUC (SybilRank). 2) The 
performance degrades quickly. With more than 
3000 attack edges, the detection accuracy of SybilBelief 
is less than 0.5, worse than a random guess, and SybilLimit 
and SybilInfer predict all Sybil nodes to be benign, 
thus losing their detection capability. Thus, SybilBelief, 
SybilLimit and SybilInfer do not work on weak trust networks 
with a large number of attack edges. 3) The performance 
of SybilFrame is stable and near optimal in all 
cases. By incorporating edge prior information, SybilFrame 
is able to restrict the amount of message passing 
across the attack edges. Thus, SybilFrame is able to 
handle the situation when the number of attack 
edges is large, and performs orders of magnitude better 
than other approaches. 

Figure 10: Performance comparison on Facebook network 

6 Evaluation on Real-World Large-Scale 
Twitter Network 

In this section, we evaluate SybilFrame on a real-world, 
large-scale Twitter network comprising over 20M 
nodes and 265M edges. We explore ways to compute 
prior information, and incorporate it into SybilFrame. 

6.1 Collecting Twitter Dataset 

We obtained a snapshot of the Twitter follower network 
which was crawled by Kwak et al. 

Pre-processing: Originally, the Twitter network is directed. 
Since it is easy for attackers to manipulate one-way 
directed edges, we transform this directed network 
into an undirected one by retaining an undirected edge 
between $u$ and $v$ if both directed edges $(u,v)$ and $(v,u)$ 
exist. Furthermore, we select the largest connected component 
of the transformed network, since all investigated 
algorithms require the network to be connected. The 
largest connected component contains 21,297,772 nodes 
and 265,025,545 edges, with average degree 24.9. 
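
The two pre-processing steps above (keeping only mutual edges, then extracting the largest connected component) can be sketched as follows. This is an in-memory illustration with hypothetical function names; a graph of the size reported here would need a distributed or streaming implementation instead.

```python
from collections import defaultdict, deque


def to_undirected_mutual(directed_edges):
    """Keep the undirected edge {u, v} only if both (u, v) and (v, u) exist."""
    directed = set(directed_edges)
    return {(min(u, v), max(u, v))
            for u, v in directed
            if u != v and (v, u) in directed}


def largest_connected_component(undirected_edges):
    """Return the node set of the largest connected component (BFS)."""
    adj = defaultdict(set)
    for u, v in undirected_edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        if len(component) > len(best):
            best = component
    return best


def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: 2|E| / |V|."""
    return 2.0 * num_edges / num_nodes
```

As a consistency check on the reported numbers, `average_degree(21297772, 265025545)` rounds to 24.9.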

We note that some previous works remove nodes with 
degrees smaller than a threshold from the social networks. 
For instance, SybilLimit [6] removes nodes 
with degree smaller than 5 and SybilInfer [7] removes 
nodes with degree smaller than 3. Mohaisen et al. 
found that such pre-processing prunes a large portion 
of nodes. Indeed, social networks often have a 
long-tail degree distribution (e.g., a power-law degree 
distribution or a lognormal degree distribution [28]), 
in which most nodes have very small degrees. 
Such pre-processing could result in high FPR or high 
FNR depending on how the OSN operator treats the 
pruned nodes. If the OSN operator treats all the pruned 
nodes as benign, then an attacker can create many malicious 
nodes with degree smaller than the threshold, resulting 
in high FNR; otherwise, a large fraction of benign nodes 
will be treated as malicious nodes, resulting in high FPR. 
Therefore, we do not perform such pre-processing on the 
Twitter network. 

Collecting ground truth: To evaluate the approaches, 
we need ground truth for the nodes in the Twitter network. 
The collected Twitter network includes users' 
Twitter IDs. Therefore, we re-crawled every account 
using Twitter's API, which tells us the status (i.e., active, 
suspended or deleted) of each account. In summary, 
we found that 145,156 nodes (i.e., 0.7%) are suspended, 
1,911,482 nodes (i.e., 9.0%) are deleted, and the rest of 
the nodes are still active. We take the suspended accounts 
as Sybil nodes and the active ones as benign nodes. 

6.2 Measuring Twitter Structure 

We find that: 1) Many Sybil nodes are isolated from other 
Sybils. 2) The number of attack edges is very large. This 
means that using existing structure-based Sybil detection 
approaches will achieve limited performance. 

No community structure: We adopt modularity, which 
ranges from -0.5 to 1, to quantify whether a partition of a network 
(in our case, the partition into the benign 
and Sybil regions of the Twitter network) can be viewed 
as two communities. Clauset et al. [30] concluded, via a 
large number of empirical experiments on real networks, 
that modularity > 0.3 indicates significant community structure. 
However, we find that the partition consisting of 
the benign and malicious regions has a modularity of only 
0.0042. Thus, the benign and Sybil regions cannot be 
viewed as two separate communities. Next, we show two 
reasons: half of the Sybil nodes are isolated and the number 
of attack edges per Sybil node is high. 
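
To make the modularity check concrete, the sketch below computes the standard Newman modularity of a given partition, $Q = \sum_c \left[ \frac{e_c}{m} - \left(\frac{d_c}{2m}\right)^2 \right]$, where $m$ is the number of edges, $e_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. The function name and the label dictionary are illustrative assumptions; the paper does not state which implementation was used.

```python
def partition_modularity(edges, label):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; label: dict mapping node -> community.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = {}    # e_c: edges with both end points in community c
    degree_sum = {}  # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        cu, cv = label[u], label[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())
```

Two disconnected triangles labeled as two communities give Q = 0.5, whereas putting every node in a single community gives Q = 0. A value near 0, such as the 0.0042 reported above, means the partition is barely better than a random assignment of edges.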

1) Half of the Sybil accounts are isolated: In total, 
we find 77,917 connected components in the Sybil 
region (i.e., the subgraph including all malicious nodes 
and edges between them). Figure 11 shows the distribution 
of sizes of these components. First, around 50% of 
Sybil nodes are isolated, i.e., they only link to benign 
nodes. Second, we find that there exists a large connected 
component including 45% of all malicious nodes. 
Specifically, this component consists of 65,579 nodes 
and 931,287 edges, resulting in an average degree of 
28.40. Thus, the large component is even denser than the 
benign region, whose average degree is 21.62. We might 
wonder if this large connected component can be viewed 
as a community. However, we find that the modularity 
of the partition consisting of the benign region and the 
largest connected component is still only 0.0046, which 
means that even this large connected component cannot 
be viewed as a community. Third, the rest of the nodes are 
in connected components whose sizes are less than 20. 

Figure 11: Distribution of connected Sybil component sizes 

Figure 12: CDFs for (a) benign and (b) Sybil nodes 

2) Large number of attack edges: We observe that 
there are 18,414,469 attack edges, which means each 
Sybil node successfully attacks around 127 benign nodes 
on average. Figure 12 further characterizes how attack 
edges are distributed among the benign and malicious regions. 
We can draw several conclusions from this figure. 

First, the benign and Sybil regions are structurally 
similar. Specifically, the numbers of neighbors of 
both benign and Sybil nodes follow long-tail distributions. 
In fact, such long-tail distributions are also widely 
observed in other OSNs such as LiveJournal and 
Google+. We speculate that Sybil nodes are imitating 
the benign region to evade automatic detection. 

Second, from Figure 12a we find that around 90% of 
benign nodes are not connected to any Sybil node. Moreover, 
the number of Sybil neighbors of benign nodes also 
follows a long-tail distribution. This implies that, although 
around 10% of benign nodes link to malicious 
nodes, most attack edges concentrate on a smaller number 
of benign nodes. For instance, we find that 90% of 
attack edges concentrate on only 3% of benign nodes. 
We speculate that such nodes are celebrities that tend to 
follow back any user who follows them. 

Figure 13: Structure of Twitter network. Benign region (18.5M), 
Sybil region (0.15M). Highlights: 
1) 90% of benign nodes have no attack edges. 
2) 3% of benign nodes have 90% of attack edges. 
3) 50% of Sybil nodes are isolated. 
4) 2% of Sybil nodes have no attack edges. 
5) 16% of Sybil nodes have 90% of attack edges. 

Third, from Figure 12b we observe that around 2% 
of Sybil nodes do not link to any benign node. Again, 
the number of benign neighbors of Sybil nodes follows 
a long-tail distribution, which implies that most attack 
edges are produced by a small portion of Sybil nodes. 
For instance, we find that 90% of attack edges are produced 
by only 16% of Sybil nodes. 

Note that the structural properties (i.e., many Sybil 
nodes are isolated and there are a large number of attack 
edges per Sybil node) of the Sybil nodes in our Twitter 
dataset match those in another large-scale Twitter network 
and those in the RenRen social network, 
which indicates the representativeness of our observations. 
Figure 13 gives a snapshot of the structure. 


Summary: We observe that structure-based 
Sybil detection approaches fail because the assumptions 
they require are not satisfied. Specifically, the benign 
and Sybil regions cannot be viewed as two separate 
communities. One reason is that a significant portion of 
the Sybil nodes are isolated, and the other reason is that 
the number of attack edges per Sybil node is high. 


6.3 Computing Node Priors 

We now discuss ways to compute node priors. The idea is 
to collect features and train a classifier that outputs probabilistic 
scores. Since we do not know whether deleted 
accounts are benign or Sybil, we do not include them in 
the training, prediction and evaluation process. We simply 
set their priors to 0.5. 

6.3.1 Collecting node features 

We compute the following three features. We compute 
Feature 1) and 2) for all nodes on the original directed 
network, and map to the corresponding nodes on the 
undirected largest connected component. We directly 
compute Feature 3) on the undirected topology. 

1) Incoming requests accepted ratio: The insight is 
that a Sybil identity is more likely to accept incoming requests 
than benign users, in order to quickly propagate 
spam. Hence, on average, Sybil identities should have a 
higher incoming requests accepted ratio. Since we only 
have structural information, we use the incoming 
and outgoing edges associated with a node to model 
the ratio. For a node $v$ on the original directed Twitter 
graph, we denote by $\Gamma_{in}(v)$ the set of all incoming edges 
of $v$, and by $\Gamma_{out}(v)$ the set of all outgoing edges 
of $v$. Hence $\Gamma_{in}(v) \cap \Gamma_{out}(v)$ is the set of edges that are 
both incoming and outgoing edges of $v$. The incoming 
requests accepted ratio is modeled as 

$$Req_{in} = \frac{|\Gamma_{in}(v) \cap \Gamma_{out}(v)|}{|\Gamma_{in}(v)|} \qquad (7)$$

where $|\Gamma(v)|$ denotes the cardinality of the set $\Gamma(v)$. 

Figure 14: Analysis of the node features. 

2) Outgoing requests accepted ratio: The insight is 
that a benign user is more reliable and hence the outgoing 
friend requests sent by him/her are more likely 
to be accepted. Hence, on average, benign users have a 
higher outgoing requests accepted ratio than Sybil identities. 
Similarly, we model the outgoing requests accepted 
ratio for a node $v$ as 

$$Req_{out} = \frac{|\Gamma_{in}(v) \cap \Gamma_{out}(v)|}{|\Gamma_{out}(v)|} \qquad (8)$$

where $|\Gamma(v)|$ denotes the cardinality of the set $\Gamma(v)$. 

3) Clustering coefficient: The clustering coefficient 
for a vertex is a graph metric that measures how close its 
neighbors are to being a complete graph. For a node $v$ 
on the undirected graph $G = (V,E)$, its local clustering 
coefficient is given as 

$$\frac{2\,|\{(i,j) : i,j \in V, (i,j) \in E\}|}{k_v (k_v - 1)} \qquad (9)$$

where $k_v$ is the degree of $v$, and $i$ and $j$ are both friends of 
$v$. The insight is that benign users tend to have well-connected 
social cliques, and users in such cliques share 
some attributes in common and are likely to be friends 
themselves. Therefore, benign users are likely to have a 
higher clustering coefficient than Sybil identities. 

Figure 14 shows the scatter plot of the outgoing requests 
accepted ratio versus the incoming requests accepted 
ratio, as well as the CDF plot of the clustering coefficient 
for benign nodes and Sybil nodes. As expected, 
benign users tend to have a higher outgoing requests accepted 
ratio, a lower incoming requests accepted ratio, 
and a higher clustering coefficient. However, no single one of 
the three features yields a clear decision 
boundary for the classification. Therefore, we need to 
leverage these three features together by building a machine 
learning classifier. 

6.3.2 Training an SVM classifier 

We adopt the LIBSVM [32] tool to build a Support Vector 
Machine (SVM) classifier. We sample a training set 
comprising 10,000 benign nodes and 10,000 Sybil nodes, 
and use the remaining nodes for testing. We train an SVM 
classifier with an RBF kernel, whose parameters $c$ and $\gamma$ are 
obtained via grid search. The overall prediction accuracy 
is 90.5%, with 9.4% FPR and 31.8% FNR. Considering 
that half of the Sybil nodes are isolated, it is 
inherently hard for previous approaches to detect more 
than half of all Sybil nodes. Thus, the ability to detect 
68.2% of Sybil nodes is impressive. 

Some applications may require a lower FPR. A natural 
way is to assign a higher penalty term to the benign class 
and a lower penalty term to the Sybil class. In this way, we 
can reduce the FPR of our node classifier to 8.5% with 
41.6% FNR. 

6.3.3 Output node priors 

To output priors, LIBSVM has an internal scheme that 
provides probability estimates by fitting a logistic curve 
and conducting a cross validation procedure [32]. We 
can use the same parameters obtained from grid search, 
and enable the probability outputs. These output scores 
are then used as node priors for SybilFrame. 
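
The three structural node features of Section 6.3.1 can be sketched as follows. This is an illustrative reading of Equations (7) to (9), not the authors' code: edges are represented by the neighbor at their other end, the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption the paper does not address.

```python
def request_ratios(followers, followees):
    """Return (Req_in, Req_out) for one node v of the directed graph.

    followers: set of nodes with an edge into v (incoming requests).
    followees: set of nodes that v points to (outgoing requests).
    """
    mutual = len(followers & followees)
    req_in = mutual / len(followers) if followers else 0.0    # ASSUMED default
    req_out = mutual / len(followees) if followees else 0.0   # ASSUMED default
    return req_in, req_out


def clustering_coefficient(adj, v):
    """Local clustering coefficient of v on the undirected graph.

    adj: dict mapping node -> set of neighbors; node ids must be comparable.
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # ASSUMED convention for nodes with fewer than two friends
    # Count each edge (i, j) between two friends of v exactly once.
    links = sum(1 for i in nbrs for j in adj[i] if j in nbrs and i < j)
    return 2.0 * links / (k * (k - 1))
```

For a node with followers {1, 2, 3, 4} and followees {3, 4, 5}, two relationships are mutual, so Req_in = 2/4 and Req_out = 2/3.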


6.4 Computing Edge Priors 

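We now explore ways to compute edge priors. As discussed 
in Section 5.3, we can leverage well-known similarity 
metrics. For each edge $(u,v) \in E$ on graph $G = (V,E)$, 
we compute the following similarity metrics: 

Number of Common Neighbors [23]: $S_{uv} = |\Gamma(u) \cap \Gamma(v)|$ 

Cosine Similarity Index [24]: $S_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{\sqrt{|\Gamma(u)| \, |\Gamma(v)|}}$ 

Jaccard Similarity Index [23]: $S_{uv} = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$ 

Adamic-Adar Similarity Index: $S_{uv} = \sum_{s \in \Gamma(u) \cap \Gamma(v)} \frac{1}{\log |\Gamma(s)|}$ 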
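
As an illustration, the four similarity metrics can be assembled into one feature vector per edge, which is the form a classifier would consume. The sketch assumes the standard definitions of these indices (in particular the square-root normalization for the Cosine index and the $1/\log$ degree weighting for Adamic-Adar); the function name is hypothetical.

```python
import math


def similarity_features(adj, u, v):
    """Return [common neighbors, cosine, Jaccard, Adamic-Adar] for edge (u, v).

    adj: dict mapping node -> set of one-hop neighbors (undirected graph).
    """
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    cn = len(common)
    cosine = cn / math.sqrt(len(adj[u]) * len(adj[v])) if adj[u] and adj[v] else 0.0
    jaccard = cn / len(union) if union else 0.0
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0;
    # the guard only protects against malformed input.
    adamic_adar = sum(1.0 / math.log(len(adj[s])) for s in common if len(adj[s]) > 1)
    return [float(cn), cosine, jaccard, adamic_adar]
```

Stacking these vectors over all edges gives the feature matrix on which the features are scaled and the classifier is trained.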

Following a similar procedure, we scale the features 
and train an RBF-SVM classifier. As a result, we can 
detect 18% of attack edges, with an FPR of 10%. To 
improve the performance, we may include more complex 
similarity metrics, e.g., the Katz index and the Leicht-Holme-Newman 
index [24], which may take longer 
to compute. Another way is to use node priors to infer 
edge priors. Generally, for an edge whose end nodes 
have different predicted labels, we can assign a lower 
score to indicate a higher possibility of being an attack edge; 
otherwise, we set the score to the default 0.9 to model homophily. 
Since our node priors work much better than 
edge priors, we adopt this inference procedure. 
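
The inference of edge priors from node priors can be sketched as below. Only the default score of 0.9 for edges between nodes with the same predicted label comes from the text; the 0.5 decision threshold and the value 0.1 used as the "lower score" are assumptions made for illustration, as is the function name.

```python
def edge_priors_from_node_priors(edges, node_prior,
                                 threshold=0.5,      # ASSUMED label cut-off
                                 attack_score=0.1,   # ASSUMED "lower score"
                                 same_score=0.9):    # default from the text
    """Assign each edge a prior based on the predicted labels of its end nodes.

    node_prior: dict mapping node -> probability of being benign.
    """
    priors = {}
    for u, v in edges:
        same_label = (node_prior[u] >= threshold) == (node_prior[v] >= threshold)
        priors[(u, v)] = same_score if same_label else attack_score
    return priors
```

For example, with node priors `{1: 0.8, 2: 0.7, 3: 0.2}`, edge (1, 2) keeps the homophily default 0.9 while edge (2, 3) is treated as a likely attack edge.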

6.5 Scalable Implementation 

We adopt the GraphLab parallel framework [33] to implement 
Loopy Belief Propagation in parallel. The parallel 
framework distributes nodes to multiple processors, 
and each processor passes and updates messages for the 
nodes that are assigned to it. The computation of node 
priors and edge priors is also parallelizable. 

6.6 Results 

We now present our experimental results. We compare 
SybilFrame with SybilBelief in terms of detection accuracy, 
FPR and FNR. We compare with SybilBelief and 
SybilRank in terms of the relative ranking of Sybil nodes. If 
Sybil nodes tend to rank before benign nodes, OSN operators 
can leverage crowdsourcing (e.g., Amazon Mechanical 
Turk [34]) to manually screen and label suspicious 
accounts. Since SybilLimit and SybilInfer do not scale 
to large datasets, we do not compare with them. 

Detection results: We randomly select 1000 benign and 
Sybil seeds, and run SybilBelief and SybilFrame. Table 1 
shows the results. We draw several conclusions: 1) Due 
to the large number of attack edges, SybilBelief predicts all 
nodes to be Sybil, thus completely losing its detection 
capability. (We validated the implementation and results 
with the authors of SybilBelief.) 2) The node prior classifier 
of SybilFrame has some detection power on its own, and is able 
to detect up to 68.2% of Sybil nodes. By assigning 
different penalty terms, the FPR can be reduced to 
8.5% (Node classifier - II). 3) Incorporating node priors 
into SybilFrame can reduce the FPR to 4.2% (SybilFrame 
- II), and achieve a better accuracy of 95.4%. 4) Since we 
label our ground truth based on whether the accounts are 
suspended by Twitter, it is possible that Twitter fails to 
detect some Sybil accounts, which SybilFrame flags 
and which are therefore counted as false positives. Thus, the true FPR should 
be lower than our estimates. 5) Considering that half 
of the Sybil nodes are isolated, the ability to detect 
68.2% of Sybil accounts is impressive. 

Table 1: Detection Performance on Twitter 

                        Accuracy    FPR      FNR 
SybilBelief             0.7%        99.3%    0.0% 
Node classifier         90.5%       9.4%     31.8% 
SybilFrame              91.8%       8.0%     33.5% 
Node classifier - II    91.2%       8.5%     41.6% 
SybilFrame - II         95.4%       4.2%     48.9% 
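
To make the three columns concrete, the sketch below computes them under the convention the table appears to follow: a false positive is a benign account flagged as Sybil, and a false negative is a Sybil account labeled benign. The function name and the string labels are illustrative assumptions.

```python
def detection_metrics(truth, predicted):
    """Return accuracy, FPR and FNR, treating 'sybil' as the flagged class.

    truth, predicted: dicts mapping node -> 'benign' or 'sybil'.
    """
    tp = fp = tn = fn = 0
    for node, label in truth.items():
        flagged = predicted[node] == "sybil"
        if label == "sybil":
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }
```

A detector that flags every node as Sybil has an FNR of 0 but an accuracy equal to the fraction of Sybil nodes, which is the pattern shown in the SybilBelief row.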


Ranking results: We rank the posterior scores generated 
by SybilFrame, as well as the scores of SybilBelief 
and SybilRank, in ascending order. We then compute the 
portion of Sybil identities in the 1K, 10K, 50K, 100K, 
1M and 10M lowest-ranked users. Figure 15 shows 
the results of four schemes: random guess, SybilRank, 
SybilBelief and SybilFrame. We draw several conclusions: 
1) In the first 1K users, SybilFrame is able to rank 
over 500 Sybil accounts, 12 times better than SybilBelief, 
35 times better than SybilRank and 72 times better 
than random guess. 2) There exists a significant descending 
trend of portions in SybilFrame, while SybilRank 
and SybilBelief do not have such a trend. This means that 
SybilFrame has much more power to rank Sybil nodes 
in the top part of the ranking list, while SybilRank and 
SybilBelief roughly distribute the Sybil accounts evenly. 
3) Given the same amount of time and human resources, 
OSN operators can use SybilFrame to detect more Sybil 
nodes than using SybilRank or SybilBelief. 

Figure 15: Portion of Sybil identities (x-axis: lowest-ranked users) 
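
The ranking metric used here, the portion of Sybil identities among the K lowest-ranked users, can be sketched as follows. The function name is hypothetical, and the sketch assumes that a lower score means "more likely Sybil", consistent with ranking in ascending order.

```python
def sybil_portion_in_lowest_ranked(scores, is_sybil, ks):
    """Fraction of Sybil nodes among the k lowest-scored nodes, for each k.

    scores: dict node -> score (lower means more suspicious, by assumption).
    is_sybil: dict node -> bool ground truth.
    """
    ranked = sorted(scores, key=scores.get)  # ascending order of score
    portions = {}
    for k in ks:
        lowest = ranked[:k]
        portions[k] = (sum(1 for n in lowest if is_sybil[n]) / len(lowest)
                       if lowest else 0.0)
    return portions
```

A scheme with good ranking power shows a portion that starts high for small K and decreases as K grows, which is the descending trend described for SybilFrame.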

Problem with Twitter's detection policy: Recall that 
we obtained our ground truth based on whether the account 
was active or suspended by Twitter. Thus, it is 
possible that some accounts are actually Sybil but evade 
Twitter's detection policy. To test this, we first re-crawl 
the top 1K accounts, and find that 7 additional accounts 
have been suspended by Twitter since our first crawl. 
Next, we manually examine the top 100 accounts, of 
which 71 are suspended and 29 are active. We examine 
the profile of each of the 29 active accounts, and find that 
only 3 accounts are likely to be real, with a long timeline 
and diverse tweets. In contrast, 24 accounts are highly 
likely to be fake, with common characteristics such as 
identical account images and few tweets. Furthermore, most 
of their tweets were published around 7/5/2009, and they 
all contain URLs and are about making money. Thus, we 
suspect that these 24 accounts were created by attackers 
and belong to the Sybil category. The remaining 2 accounts 
have fewer than two tweets and a protected profile, 
and are marked as suspicious. We give a complete list 
of these 29 active accounts in Appendix 9.4. 

From the above analysis, we conclude that: 1) Twitter's 
Sybil detection policy is not optimal. 2) SybilFrame 
is able to uncover a large fraction (24/29) of suspicious 
accounts that Twitter fails to detect. Hence, the true FPR 
of SybilFrame should be lower than our estimates. 

6.7 Summary 

In this section, we discussed ways to compute priors 
and implemented SybilFrame in parallel. We evaluated 
SybilFrame on a real-world, large-scale Twitter network, 
and have the following observations: 

1) In terms of detection performance, SybilFrame performs 
orders of magnitude better than SybilBelief. Even 
when the dataset is noisy and the number of attack 
edges is large, SybilFrame can detect up to 68.2% of Sybil 
nodes. By tuning parameters, SybilFrame can 
achieve 4.2% FPR with a 51% detection rate. 

2) In terms of ranking performance, SybilFrame performs 
orders of magnitude better than SybilBelief and 
SybilRank. Among the first 1K users, SybilFrame is able 
to rank 552 Sybil accounts, 12 times better 
than SybilBelief and 35 times better than SybilRank. 

3) SybilFrame is able to uncover a large fraction of suspicious 
accounts that Twitter fails to detect. 

7 Discussion 

Defense-in-depth: We discuss the resilience of our approach 
to attackers that aim to mimic the features we use 
in Stage 1. Specifically for the Twitter experiment, if the attacker 
wants to mimic the features and let more Sybils 
bypass the node classifier, he/she needs to control Sybil 
identities to establish more connections among themselves 
and form Sybil clusters, in order to have a lower 
$Req_{in}$, a higher $Req_{out}$ and a higher clustering coefficient. 
As a result, Sybils will be much more densely connected, 
and the edge classifier in Stage 1 together with LBP in 
Stage 2 will be more effective at detecting them. This is 
the basic idea of SybilFrame's multi-layered protection 
and defense-in-depth. Also, it is unclear 
whether an attacker would be willing to carry out 
such a complex strategy, which consumes a lot of 
time and resources. In our Twitter experiment, we found 
that Sybil identities were often unsophisticated (e.g., half 
of them are isolated and they share common characteristics 
as discussed in Section 6), which makes it very 
easy for a human expert to identify them. However, Twitter 
fails to detect a significant fraction of Sybils, whereas 
SybilFrame is able to uncover them. 

Furthermore, we recall that although we collected 
only structural features to evaluate SybilFrame, SybilFrame 
is an open framework that is able to incorporate 
content information. Similarly, we can extract content 
features of each node and edge, and build a content-based 
classifier, or even combine structural and content features 
together to build a more powerful general classifier. 

Lower FPR: Our experiment considers suspended accounts 
in Twitter as ground truth for Sybil identities. 
Correspondingly, accounts that were not suspended were 
labeled as benign accounts. We note that this evaluation 
is conservative: our analysis counts accounts that are 
labeled as malicious by SybilFrame, but not suspended 
by Twitter, as false positives. It is quite possible that 
these apparent false positives are actually malicious but 
were not detected by Twitter. As shown in Section 
6.6, Twitter's current detection policy is far from optimal, 
and there is a large fraction of malicious accounts 
that Twitter fails to suspend. Therefore, our 4.2% FPR 
likely overestimates the true FPR. 


Sybil detection capability in Twitter: Recall that 
SybilFrame was able to detect 66.5% of Sybil identities (Table 
1, SybilFrame). By tuning parameters, SybilFrame 
was able to reduce the FPR to 4.2% while still detecting 51% 
of Sybil identities (Table 1, SybilFrame - II). We believe 
that this result is close to the optimum that any structure-based 
approach could achieve. On the Twitter graph, half 
of the Sybil identities form a connected component, and 
the remaining half are isolated and only connect to benign 
nodes. Since previous structure-based approaches 
are mostly based on detecting local communities, they 
are limited in their ability to detect those isolated Sybil 
nodes. 


Broader applicability: Our approach of defense-in-depth, 
using a multi-stage classification framework 
that is able to incorporate prior information about nodes 
and edges, has broad applicability for network security. 
For example, the area of botnet detection can benefit 
from similar techniques that combine host-level information 
with network structure-based information. 


8 Conclusion 

In this paper, we proposed SybilFrame, a defense-in-depth 
framework for structure-based Sybil detection in 
online social systems. SybilFrame uses a multi-stage 
classification mechanism, which is able to incorporate 
heterogeneous sources and types of information about 
the social network. By leveraging fine-grained local 
information about users and edges, SybilFrame transforms 
local information into beliefs of labels, and then 
propagates those beliefs to make collective inferences. 

We experimentally evaluated the accuracy of our approach 
using both synthetic and real-world social network 
topologies, including a large-scale 
Twitter dataset. Our results demonstrate that SybilFrame 
is resilient to a high number of attack edges, and 
performs an order of magnitude better than previous 
structure-based approaches. 

Future work includes collecting and incorporating local 
content information, and enforcing more fine-grained 
control on the belief propagation rules. 

9 Appendix 

9.1 Prior Generator 


Node prior generator: Algorithm 2 gives our Node 
Prior Generator for experiments on synthetic graphs in 
Section 4. We randomly generate node priors based on 
FPR and FNR. 

Algorithm 2: Node Prior Generator 
Data: nodes with true labels, set of trust seeds, FPR 
and FNR 
Result: priors of all nodes 
for each node v do 
    if v is a benign trust seed then 
        set Prior_v = 0.9 
    else if v is a Sybil trust seed then 
        set Prior_v = 0.1 
    else 
        if the true label of v is benign then 
            i = randDouble(0, 1) 
            if i < FPR then 
                Prior_v = randDouble(0.1, 0.5) 
            else 
                Prior_v = randDouble(0.5, 0.9) 
        else 
            i = randDouble(0, 1) 
            if i < FNR then 
                Prior_v = randDouble(0.5, 0.9) 
            else 
                Prior_v = randDouble(0.1, 0.5) 

Edge prior generator: Algorithm 3 gives our Edge 
Prior Generator for experiments on synthetic graphs in 
Section 4. We randomly generate edge priors based on 
FPR and FNR. 

Algorithm 3: Edge Prior Generator 
Data: nodes with true labels, set of trust seeds, FPR 
and FNR 
Result: priors of all edges 
for each edge (u, v) do 
    if both u and v are trust seeds then 
        if u and v have different labels then 
            set Prior_{u,v} = 0.1 
        else 
            set Prior_{u,v} = 0.9 
    else 
        if u and v have different labels then 
            i = randDouble(0, 1) 
            if i < FNR then 
                Prior_{u,v} = randDouble(0.5, 0.9) 
            else 
                Prior_{u,v} = randDouble(0.1, 0.5) 
        else 
            i = randDouble(0, 1) 
            if i < FPR then 
                Prior_{u,v} = randDouble(0.1, 0.5) 
            else 
                Prior_{u,v} = randDouble(0.5, 0.9) 
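
A runnable sketch of the two prior generators is given below. It follows the branching of Algorithms 2 and 3, with `random.uniform` standing in for `randDouble`; the function names, string labels and the shared helper are illustrative choices rather than part of the original pseudocode.

```python
import random

BENIGN, SYBIL = "benign", "sybil"


def _noisy_prior(truthfully_high, error_rate, rng):
    """Draw from the correct half of [0.1, 0.9] unless a simulated error flips it.

    truthfully_high: True if an error-free prior would lie in [0.5, 0.9].
    error_rate: FPR for benign nodes / non-attack edges, FNR otherwise.
    """
    flipped = rng.random() < error_rate
    high = truthfully_high != flipped
    return rng.uniform(0.5, 0.9) if high else rng.uniform(0.1, 0.5)


def node_prior(label, is_seed, fpr, fnr, rng=random):
    """Algorithm 2: prior of a single node."""
    if is_seed:
        return 0.9 if label == BENIGN else 0.1
    if label == BENIGN:
        return _noisy_prior(True, fpr, rng)
    return _noisy_prior(False, fnr, rng)


def edge_prior(label_u, label_v, both_seeds, fpr, fnr, rng=random):
    """Algorithm 3: prior of a single edge (u, v)."""
    is_attack_edge = label_u != label_v
    if both_seeds:
        return 0.1 if is_attack_edge else 0.9
    if is_attack_edge:
        return _noisy_prior(False, fnr, rng)
    return _noisy_prior(True, fpr, rng)
```

Passing a seeded `random.Random` instance as `rng` makes the generated priors reproducible across runs, which is convenient when averaging over repeated experiments.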


9.2 Experiments on the Influence of Edge 
Priors 


As in Section 4, we set FPR = 0.1 and vary FNR from 
0 to 0.5, keeping everything else in the basic setup fixed. 
Figure 16 shows the results of SybilFrame and SybilBelief. 
We can see that SybilFrame performs well and 
outperforms SybilBelief even when FNR is 0.5. 

Figure 16: Set FPR=0.1 and vary FNR (edge prior). Panels include 
(c) Rejected benign nodes and (d) Accepted Sybil nodes. 

9.3 Experiments on Seed Targeting Attacks 

As in Section 4.3, we evaluate SybilFrame under seed 
targeting attacks. Figure 17 shows the AUC as a function 
of the number of attack edges for the four scenario 
combinations of trust seeds, in the node prior experiment 
(FPR=FNR=0.3) and the edge prior experiment (FPR=0.1, 
FNR=0.5). Figure 18 shows the results for the number of 
rejected benign nodes and the number of accepted Sybil 
nodes. 

Figure 17: AUC of SybilFrame under seed targeting attacks. 
(a) Given node priors, (b) Given edge priors. 

Figure 18: Performance of SybilFrame under seed targeting 
attacks. (a) Rejected benign nodes (node prior), (b) Rejected 
benign nodes (edge prior), (c) Accepted Sybil nodes (node 
prior), (d) Accepted Sybil nodes (edge prior). 

9.4 Complete List of 29 Active Accounts 

Table 2 gives a complete list of the 29 active accounts among 
the top 100 ranked accounts. For simplicity, we use 
pseudo names for account images; accounts sharing the 
same image name have the same account image. We have 
the following observations: 

1) Among the 29 active accounts, 3 are benign users, 2 are 
suspicious and 24 are Sybil identities. 

2) Benign users tend to have a long timeline with diverse 
tweets. 

3) Sybil identities follow many more users 
than follow them. In addition, Sybil identities have few 
tweets (e.g., mostly fewer than 5) and a short timeline (e.g., 
around 7/5/09). Most of their tweets are about making 
money and contain URLs. Furthermore, Sybil identities 
tend to share common account images. 